Abstract

We study the correctability of efficiently samplable errors. Specifically, we consider the setting in which errors are efficiently samplable without knowledge of the code or the transmitted codeword, and the error rate is not bounded. Assuming the existence of one-way functions, there are samplable pseudorandom distributions that are not correctable by efficient coding schemes. We show that there is an oracle relative to which there is a samplable flat distribution over $\{0,1\}^n$ of entropy $m$ that is not pseudorandom, but is uncorrectable by efficient coding schemes of rate less than $1 - m/n - \omega(\log n/n)$. The result implies that correcting samplable additive errors is difficult even when the errors are not pseudorandom and low-rate coding schemes are employed. We also show that the existence of one-way functions is necessary to derive impossibility results for coding schemes of rate less than $1 - m/n$ that correct flat distributions of entropy $m$.


1 Introduction 

In the theory of error-correcting codes, two of the most-studied channel models are probabilistic channels and worst-case channels. In probabilistic channels, errors are introduced through stochastic processes; the best-known example is the binary symmetric channel (BSC). In worst-case (or adversarial) channels, errors are introduced adversarially, with knowledge of the code and the transmitted codeword, subject only to a bound on the error rate. In his seminal work [20], Shannon showed that reliable communication can be achieved over the BSC if the coding rate is less than $1 - H_2(p)$, where $H_2(\cdot)$ is the binary entropy function and $p$ is the crossover probability of the BSC. In contrast, it is known that reliable communication cannot be achieved over worst-case channels when the error rate is at least 1/4 unless the coding rate tends to zero [19].

If we view the introduction of errors as computation performed by the channel, probabilistic channels perform low-cost computation with little knowledge about the code and the input, while worst-case channels perform high-cost computation with full knowledge. As intermediate channels between probabilistic channels and worst-case channels, Lipton [16] introduced computationally-bounded channels, where errors are introduced by polynomial-time computation. He showed that reliable communication can be achieved at coding rates less than $1 - H_2(p)$ in the shared-randomness setting, where $p < 1$ is the error rate, that is, the fraction of errors introduced by the channel. Micali et al. [18] presented reliable coding schemes in the public-key infrastructure setting. Guruswami and Smith [10] gave reliable coding schemes without assuming shared randomness or a public-key infrastructure. Note that these works [16, 18, 10] consider settings in which channels are computationally-bounded and the error rate is bounded.

*Preliminary version of parts of the work appeared in the Proceedings of the 2014 IEEE International Symposium on Information Theory.

In this work, we also focus on computationally-bounded channels. In particular, we consider samplable additive channels, in which errors are sampled by efficient computation and added to the codeword in an oblivious way. More precisely, errors are sampled by a probabilistic polynomial-time algorithm, but the algorithm does not depend on the choice of the code or the transmitted codeword. This is stronger than the standard notion of obliviousness, where an oblivious channel can depend on the code, but not on the codeword (cf. [15]).

Furthermore, in this work, we consider samplable additive channels with unbounded error rate; that is, the error rate $p$ is not bounded a priori. Although most work in the literature focuses on bounded error-rate settings, this restriction may be unnecessary when modeling errors generated by nature as the result of polynomial-time computation. We believe unbounded error-rate settings are worth studying, because exploring correctability in these settings can reveal which error structures make error correction possible. In particular, the study of samplable additive errors can reveal which computational structures errors must have in order to be correctable.

Samplable additive channels are relatively simple channel models since the error distributions 
are identical for every coding scheme and transmitted codeword. The binary symmetric channel 
is an example of samplable additive channels. Thus, we consider the setting in which coding 
schemes can be designed with the knowledge of the error distribution that is generated by an 
efficient algorithm. This setting is incomparable to previous notions of error correction against 
computationally-bounded channels. Our model is stronger because we do not restrict the error 
rate, but is weaker because the channel cannot see the code or the transmitted codeword. 

1.1 Our Results 

We would like to characterize samplable additive channels regarding the existence of efficient reliable 
coding schemes. We use the entropy of the error distributions as a criterion. The reason is that, 
if the entropy is zero, it is easy to achieve reliable communication since the error is a fixed string 
and this information can be used for designing a reliable coding scheme. On the other hand, if 
the error distribution has full entropy, reliable communication is impossible, since a uniformly random string is added to the transmitted codeword. Thus, there should be bounds
on the existence of efficient reliable coding schemes depending on the entropy of the underlying 
error distribution. When reliable coding schemes exist, an important quantity of the scheme is the 
information rate (or simply rate), which is the ratio of the message length to the codeword length. 
We investigate the bounds on the rate when reliable communication is achievable. 

Let $Z$ be an error distribution over $\{0,1\}^n$ associated with a samplable additive channel, and let $H(Z)$ be the Shannon entropy of $Z$. Note that for a flat distribution, which is a uniform distribution over its support, $H(Z)$ is equal to the min-entropy of $Z$.

Basic observations. First, we observe several basic facts regarding the correctability of samplable additive errors.

Consider flat distributions $Z$. It follows from a probabilistic argument that for any flat $Z$, there is a linear code that corrects $Z$ with error $\epsilon$ for rate $R < 1 - H(Z)/n - 2\log(1/\epsilon)/n$. The decoding complexity is $O(n^2 2^{H(Z)})$. Thus, if $H(Z) = O(\log n)$, the code can correct $Z$ in polynomial time. Conversely, by a simple counting argument, no flat distribution $Z$ is correctable with error $\epsilon$ at rate $R > 1 - H(Z)/n + \log(1/(1-\epsilon))/n$. In addition, we observe that it is difficult to construct a single code that corrects the whole family of flat distributions with the same entropy. Specifically, we show that for every code of rate $k/n$ and every $m < k$, there is a flat distribution $Z$ with $H(Z) = m$ that is not correctable by the code.

A positive result can be obtained if we consider much more structured errors. If the error 
vectors form a linear subspace, there is an efficient coding scheme that corrects them by syndrome decoding with optimal rate $R = 1 - m/n$, where $m$ is the dimension of the linear subspace.

Regarding efficient coding schemes, we observe that if errors are pseudorandom (in the cryptographic sense), then efficient coding schemes cannot correct them. This implies that, assuming the existence of one-way functions, there exist $Z$ with $H(Z) = n^{\epsilon}$ for $0 < \epsilon < 1$ that are not efficiently correctable.

Errors with membership test. To avoid the impossibility of correcting pseudorandom errors, we consider samplable distributions for which membership can be tested efficiently. Such distributions are not pseudorandom, since the membership test can be used to distinguish them from the uniform distribution.

We show the existence of an uncorrectable distribution with membership test for low-rate codes. Specifically, we show that there is an oracle relative to which there exists a samplable distribution $Z$ of entropy $\omega(\log n)$ that is not correctable by efficient coding schemes of rate $R < 1 - H(Z)/n - \omega(\log n/n)$. The result complements the impossibility of correcting flat distributions at rate $R > 1 - H(Z)/n + O(1/n)$. Also, the entropy bound $\omega(\log n)$ is optimal since, as in the above observations, there is a probabilistic construction of a code that corrects $Z$ in polynomial time if $H(Z) = O(\log n)$.

To derive this result, we use the technique of Wee [25], which is based on the reconstruction paradigm of Gennaro and Trevisan [8], and apply it to the problem of error correction. We show that if a samplable distribution with sampler $S$ is efficiently correctable, then the function computed by $S$ has a short description; thus, by a counting argument, efficient coding schemes cannot correct every samplable distribution with membership test.

This negative result seems counterintuitive. In general, constructing low-rate codes seems to be easier than constructing high-rate codes, yet the result shows that efficient low-rate codes can also fail. The reason is that the reconstruction paradigm crucially uses the fact that some function has a short description. In our case, we use the fact that the functions generating samplable errors have short descriptions if the errors are correctable by coding schemes with short descriptions. Since low-rate codes have short descriptions, the result applies to low-rate codes.

Necessity of one-way functions. Finally, we show that it is difficult to prove unconditional impossibility results for coding schemes of rate $R < 1 - H(Z)/n$. Specifically, we show that if one-way functions do not exist, then any samplable flat distribution $Z$ is correctable by an efficient coding scheme of rate $1 - H(Z)/n - O(\log n/n)$. Thus, it is necessary to assume the existence of one-way functions or oracle access to derive impossibility results for rate $R < 1 - H(Z)/n$.

The results are summarized in Table 1, where R denotes the rate of coding schemes. 
Table 1: Correctability of Samplable Additive Error Z

| $H(Z)$ | Correctability | Assumptions | References |
|---|---|---|---|
| $O(\log n)$ | Efficiently correctable with error $\epsilon$ for $R < 1 - H(Z)/n - 2\log(1/\epsilon)/n$ | No | Proposition 3 |
| $\omega(\log n)$ | $\exists Z$ with membership test, not efficiently correctable for $R < 1 - H(Z)/n - \omega(\log n/n)$ | Oracle access | Corollary 1 |
| $n^{\epsilon}$ $(0 < \epsilon < 1)$ | $\exists Z$ not efficiently correctable for any $R$ | OWF | Proposition 7 |
| $m$, $1 < m < k$ | $\forall$ code with $R = k/n$, $\exists Z$ not correctable by the code | No | Proposition 5 |
| - | $\forall$ linear subspace $Z$ of dimension $m$, $\exists$ code correcting $Z$ with $R \le 1 - m/n$ | No | Proposition 6 |
| - | $\forall$ flat $Z$: (1) $\exists$ (non-explicit) code correcting $Z$ with error $\epsilon$ for $R < 1 - H(Z)/n - 2\log(1/\epsilon)/n$ | No | Proposition 3 |
| - | $\forall$ flat $Z$: (2) not correctable with error $\epsilon$ for $R > 1 - H(Z)/n + \log(1/(1-\epsilon))/n$ | No | Proposition 4 |
| - | $\forall$ flat $Z$ is efficiently correctable for $R < 1 - H(Z)/n - O(\log n/n)$ | No OWF | Theorem 3 |


1.2 Related Work 

The notion of computationally-bounded channels was introduced by Lipton [16]. He showed that if the sender and the receiver can share secret randomness, then the Shannon capacity can be achieved for this channel. Micali et al. [18] considered a similar channel model in a public-key setting, and gave a coding scheme based on list-decodable codes and digital signatures. Guruswami and Smith [10] gave constructions of capacity-achieving codes for worst-case additive-error channels and time/space-bounded channels. In their additive-error channel setting, the error rate is bounded, and the errors need only be independent of the encoder's random coins. They also gave strong impossibility results for bit-fixing channels, but these results rely on channels that use information about the code and the transmitted codewords. In this work, we give impossibility results even for channels that do not use such information.

Samplable distributions were also studied in the context of data compression [9, 22, 25], randomness extractors [21, 24, 4], and randomness condensers [5]. Samplable distributions with membership test appeared in the study of efficient compressibility of samplable sources [9, 22, 25].

2 Preliminaries 

For $n \in \mathbb{N}$, we write $[n]$ for the set $\{1, 2, \ldots, n\}$. For a distribution $X$, we write $x \sim X$ to indicate that $x$ is chosen according to $X$. We may also use $X$ as a random variable distributed according to $X$. The support of $X$ is $\mathrm{Supp}(X) = \{x : \Pr_X(x) \neq 0\}$, where $\Pr_X(x)$ is the probability that $X$ assigns to $x$. The Shannon entropy of $X$ is $H(X) = \mathbb{E}_{x \sim X}[-\log \Pr_X(x)]$. The min-entropy of $X$ is given by $\min_{x \in \mathrm{Supp}(X)}\{-\log \Pr_X(x)\}$. It is known that the min-entropy of $X$ is a lower bound on $H(X)$. A flat distribution is a distribution that is uniform over its support. For flat distributions, the Shannon entropy is equal to the min-entropy. Thus, we simply say that a flat distribution $Z$ has entropy $m$ if its Shannon entropy is $m$. For $n \in \mathbb{N}$, we write $U_n$ for the uniform distribution over $\{0,1\}^n$.
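The two entropy notions above can be checked numerically. The following is a minimal sketch, assuming a distribution is given as a Python dict from outcomes to probabilities (a representation chosen here only for illustration); it confirms that for a flat distribution the Shannon entropy and the min-entropy coincide, while in general the min-entropy is smaller.

```python
import math

def shannon_entropy(probs):
    """H(X) = E_{x~X}[-log2 Pr_X(x)], summing only over Supp(X)."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)

def min_entropy(probs):
    """Minimum over x in Supp(X) of -log2 Pr_X(x)."""
    return min(-math.log2(p) for p in probs.values() if p > 0)

def flat(support):
    """Uniform (flat) distribution over the given support."""
    return {x: 1 / len(support) for x in support}

Z = flat(["000", "011", "101", "110"])   # flat, |Supp(Z)| = 4, so entropy 2
X = {"a": 0.5, "b": 0.25, "c": 0.25}     # not flat
print(shannon_entropy(Z), min_entropy(Z))  # both equal 2 for flat Z
print(shannon_entropy(X), min_entropy(X))  # 1.5 versus 1
```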

We define the notion of additive-error correcting codes. 

Definition 1 (Additive-error correcting codes). For two functions $\mathrm{Enc}: \mathbb{F}^k \to \mathbb{F}^n$ and $\mathrm{Dec}: \mathbb{F}^n \to \mathbb{F}^k$, and a distribution $Z$ over $\mathbb{F}^n$, where $\mathbb{F}$ is a finite field, we say (Enc, Dec) corrects (additive error) $Z$ with error $\epsilon$ if for any $x \in \mathbb{F}^k$, we have $\Pr_{z \sim Z}[\mathrm{Dec}(\mathrm{Enc}(x) + z) \neq x] < \epsilon$. The rate of (Enc, Dec) is $k/n$.

Definition 2. A distribution $Z$ is said to be correctable with rate $R$ and error $\epsilon$ if there is a pair of functions (Enc, Dec) of rate $R$ that corrects $Z$ with error $\epsilon$.

We call a pair (Enc, Dec) a coding scheme or simply a code. The coding scheme is called efficient if Enc and Dec can be computed in time polynomial in $n$. The code is called linear if Enc is a linear mapping, that is, for any $x, y \in \mathbb{F}^k$ and $a, b \in \mathbb{F}$, $\mathrm{Enc}(ax + by) = a\,\mathrm{Enc}(x) + b\,\mathrm{Enc}(y)$. If $|\mathbb{F}| = 2$, we may use $\{0,1\}$ instead of $\mathbb{F}$.

Next, we define syndrome decoding for linear codes. 

Definition 3. For a linear code (Enc, Dec), Dec is said to be a syndrome decoder if there is a generator matrix $G \in \mathbb{F}^{Rn \times n}$ and a function $\mathrm{Rec}: \mathbb{F}^{n-Rn} \to \mathbb{F}^n$ such that $\mathrm{Dec}(y) = (y - \mathrm{Rec}(yH^{\mathsf T}))\,G^{-1}$, where $\mathrm{Enc}(x) = xG$ for all $x \in \mathbb{F}^{Rn}$, $G^{-1} \in \mathbb{F}^{n \times Rn}$ is a right inverse matrix of $G$ (i.e., $GG^{-1} = I$), and $H \in \mathbb{F}^{(n-Rn) \times n}$ is a dual matrix of $G$ (i.e., $GH^{\mathsf T} = O$).
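Definition 3 can be illustrated on a small binary code. The sketch below is illustrative only: it assumes the binary [7,4] Hamming code in systematic form $G = [I_4 \mid P]$ (our choice, not taken from the text), for which $H = [P^{\mathsf T} \mid I_3]$ and $G^{-1} = [I_4; 0]$, and it tabulates Rec for the error set consisting of the zero vector and the seven weight-one vectors. The names `enc`, `dec`, `REC`, and `G_INV` are hypothetical.

```python
# Systematic generator matrix G = [I_4 | P] of a binary [7,4] Hamming code.
P = [[1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1]]
K, N = 4, 7

G = [[int(i == j) for j in range(K)] + P[i] for i in range(K)]
# Parity-check matrix H = [P^T | I_3]; then G H^T = P + P = 0 over GF(2).
H = [[P[i][r] for i in range(K)] + [int(r == j) for j in range(N - K)] for r in range(N - K)]
HT = [list(col) for col in zip(*H)]
# G^{-1} = [I_4; 0] is a right inverse of G, i.e. G G^{-1} = I_4.
G_INV = [[int(i == j) for j in range(K)] for i in range(K)] + [[0] * K for _ in range(N - K)]

def vec_mat(v, M):
    """Row vector times matrix over GF(2)."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) % 2 for j in range(len(M[0]))]

def enc(x):
    return vec_mat(x, G)

# Rec: syndrome -> error vector, tabulated for Z = {0} plus the 7 weight-one vectors.
ERRORS = [[0] * N] + [[int(i == j) for j in range(N)] for i in range(N)]
REC = {tuple(vec_mat(z, HT)): z for z in ERRORS}

def dec(y):
    """Dec(y) = (y - Rec(y H^T)) G^{-1}, as in Definition 3."""
    z = REC[tuple(vec_mat(y, HT))]
    return vec_mat([(a - b) % 2 for a, b in zip(y, z)], G_INV)
```

Because every error in the tabulated set has a distinct syndrome, `dec(enc(x) + z)` returns `x` for every message and every such `z`.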

We consider a computationally-bounded analogue of additive-error channels. We introduce the 
notion of samplable distributions. 

Definition 4. A distribution family $\mathcal{Z} = \{Z_n\}_{n \in \mathbb{N}}$ is said to be samplable if there is a probabilistic polynomial-time algorithm $S$ such that $S(1^n)$ is distributed according to $Z_n$ for every $n \in \mathbb{N}$.

We consider the setting in which coding schemes can depend on the sampling algorithm of 
Z, but not on its random coins, and Z does not use any information on the coding scheme or 
transmitted codewords. In this setting, the randomization of coding schemes does not help much. 
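As a concrete instance of Definition 4 and of the setting just described, the binary symmetric channel mentioned in the introduction has an obvious sampler. The sketch below is illustrative only: error vectors are lists of bits, and the names `bsc_sampler` and `additive_channel` are hypothetical. The channel draws $z$ from $S$ without looking at the codeword, which is the obliviousness assumed here.

```python
import random

def bsc_sampler(n, p):
    """S(1^n) for the binary symmetric channel: each of the n error bits is 1
    independently with probability p. Runs in O(n) time and ignores the code."""
    def S(rng):
        return [int(rng.random() < p) for _ in range(n)]
    return S

def additive_channel(codeword, S, rng):
    """Oblivious additive channel: draw z ~ S(1^n), then output codeword + z over GF(2)."""
    z = S(rng)
    return [(c + e) % 2 for c, e in zip(codeword, z)]

rng = random.Random(0)
print(additive_channel([0, 1, 1, 0, 1, 0, 0, 1], bsc_sampler(8, 0.1), rng))
```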

Proposition 1. Let (Enc, Dec) be a randomized coding scheme that corrects a distribution $Z$ with error $\epsilon$. Then, there is a deterministic coding scheme that corrects $Z$ with error $\epsilon$.

Proof. Assume that Enc uses at most $t$ bits of randomness. Since (Enc, Dec) corrects $Z$ with error $\epsilon$, for every $x \in \mathbb{F}^k$ we have $\Pr_{z \sim Z, r \sim U_t}[\mathrm{Dec}(\mathrm{Enc}(x; r) + z) \neq x] < \epsilon$. By an averaging argument, for every $x \in \mathbb{F}^k$ there exists $r_x \in \{0,1\}^t$ such that $\Pr_{z \sim Z}[\mathrm{Dec}(\mathrm{Enc}(x; r_x) + z) \neq x] < \epsilon$. Thus, by defining $\mathrm{Enc}'(x) = \mathrm{Enc}(x; r_x)$, we obtain a deterministic coding scheme $(\mathrm{Enc}', \mathrm{Dec})$ that corrects $Z$ with error $\epsilon$. □

The fact that randomization does not help much is in contrast to the setting of Guruswami and Smith [10], where the channels can use information about the coding scheme and the transmitted codewords, but not the random coins used for encoding. They present a randomized coding scheme with optimal rate $1 - H_2(p)$ for worst-case additive-error channels, for which deterministic coding schemes are only known to achieve rate $1 - H_2(2p)$, where $p$ is the error rate of the channels.
3 Basic Properties of Samplable-Additive Errors


We present several basic facts regarding the correctability of samplable additive errors. Although 
the claims in this section are elementary or folklore, we include the proofs for completeness. 


3.1 Errors from Flat Distributions 

For any flat distribution $Z$, a random linear code can correct $Z$ with high probability. Consider a random linear code of rate $R$ whose parity-check matrix $H$ is chosen uniformly at random from $\mathcal{H}_R = \{0,1\}^{(n-Rn) \times n}$. Decoding is done by brute force: for a received word $y$, find $x \in \{0,1\}^{Rn}$ and $z \in \mathrm{Supp}(Z)$ such that $y = xG + z$, and output $x$, where $G$ is a generator matrix for $H$.

Proposition 2. For any flat distribution $Z$ over $\{0,1\}^n$, a random linear code from $\mathcal{H}_R$ corrects $Z$ with expected error (over the choice of $H$) at most $2^{-(n-Rn-H(Z))}$.


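Proof. It is sufficient to show that, for a random $H$ from $\mathcal{H}_R$, every $z \in \mathrm{Supp}(Z)$ has a unique syndrome $zH^{\mathsf T}$ with high probability. For each $z \in \mathrm{Supp}(Z)$,

$$
\begin{aligned}
&\Pr_{H \sim \mathcal{H}_R}\left[\exists z' \in \mathrm{Supp}(Z) \setminus \{z\} : zH^{\mathsf T} = z'H^{\mathsf T}\right]\\
&\quad= \Pr_{H \sim \mathcal{H}_R}\left[\exists z' \in \mathrm{Supp}(Z) \setminus \{z\} : \forall i \in [n-Rn],\ h_i \cdot (z - z') = 0\right] && (1)\\
&\quad\le \sum_{z' \in \mathrm{Supp}(Z) \setminus \{z\}} \ \prod_{i \in [n-Rn]} \Pr_{h_i \sim U_n}\left[h_i \cdot (z - z') = 0\right]\\
&\quad\le 2^{-(n-Rn-H(Z))},
\end{aligned}
$$

where $h_1, \ldots, h_{n-Rn}$ in (1) are the rows of $H$, and the last inequality follows from the facts that $z - z' \neq 0$ (so each factor equals $1/2$) and $|\mathrm{Supp}(Z)| = 2^{H(Z)}$ for flat $Z$. □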
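The union bound in the proof can be sanity-checked by simulation. The following is a minimal Monte Carlo sketch, assuming small illustrative parameters ($n = 12$, $n - Rn = 8$, $H(Z) = 3$) and a hypothetical flat $Z$ whose support is 8 random strings; rows of $H$ are stored as $n$-bit Python integers. The estimated collision probability should fall below $2^{-(n-Rn-H(Z))} = 1/32$, up to sampling noise.

```python
import random

def parity(v: int) -> int:
    """Parity of the number of 1 bits of v."""
    return bin(v).count("1") & 1

def syndrome(z: int, H: list) -> tuple:
    """z H^T over GF(2): bit i is the inner product h_i . z for the i-th row h_i of H."""
    return tuple(parity(h & z) for h in H)

def collision_rate(n, r, support, trials, seed=0):
    """Estimate Pr_H[exists z' != z in Supp(Z) with z H^T = z' H^T] for z = support[0],
    with H uniform over {0,1}^{r x n}, where r = n - Rn."""
    rng = random.Random(seed)
    z, others = support[0], support[1:]
    hits = 0
    for _ in range(trials):
        H = [rng.getrandbits(n) for _ in range(r)]
        s = syndrome(z, H)
        if any(syndrome(zp, H) == s for zp in others):
            hits += 1
    return hits / trials

n, r, m = 12, 8, 3                                          # Rn = 4, H(Z) = m = 3
support = random.Random(1).sample(range(2 ** n), 2 ** m)    # hypothetical flat Z
est = collision_rate(n, r, support, trials=5000)
bound = 2.0 ** -(r - m)                                     # 2^{-(n - Rn - H(Z))}
print(f"estimate={est:.4f}  bound={bound:.4f}")
```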


By Proposition 2 and Markov's inequality, for at least a $(1 - 2^{-(n-Rn-H(Z))/2})$-fraction of $H$ in $\mathcal{H}_R$, the corresponding code corrects $Z$ with error at most $2^{-(n-Rn-H(Z))/2}$. Thus, we have the following proposition.


Proposition 3. Let $Z$ be any flat distribution over $\{0,1\}^n$ of entropy $m$. There is a linear code of rate $R$ that corrects $Z$ with error $\epsilon$ for $R < 1 - m/n - 2\log(1/\epsilon)/n$. The decoding complexity is at most $O(n^2 2^m)$.


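Proof. The existence of such a code follows from the above argument, since $R < 1 - m/n - 2\log(1/\epsilon)/n$ is equivalent to $2^{-(n-Rn-m)/2} < \epsilon$. Given a received word $y$, the brute-force decoder checks whether $(y - z)H^{\mathsf T} = 0$ for each $z \in \mathrm{Supp}(Z)$, where $H$ is the parity-check matrix. If so, it outputs $x$ satisfying $xG = y - z$. Each check takes $O(n^2)$ time, so decoding takes $O(n^2) \cdot |\mathrm{Supp}(Z)| = O(n^2 2^m)$ time. □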
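The brute-force decoder from the proof can be sketched directly. This is a minimal illustration under stated assumptions: we draw a random systematic code $G = [I_k \mid P]$ with $P$ uniform and $H = [P^{\mathsf T} \mid I_{n-k}]$ (a convenient variant of the uniformly random $H$ used in the text, so that $x$ is simply the first $k$ coordinates of $xG$), and the names `random_systematic_code` and `brute_force_decode` are hypothetical. Decoding is guaranteed to succeed whenever all syndromes of $\mathrm{Supp}(Z)$ are distinct, which is the event bounded in Proposition 2.

```python
import random

def xor(u, v):
    """Coordinate-wise addition over GF(2) (also subtraction)."""
    return [(a + b) % 2 for a, b in zip(u, v)]

def syndrome(v, H):
    """v H^T over GF(2)."""
    return [sum(a * b for a, b in zip(v, row)) % 2 for row in H]

def random_systematic_code(k, n, rng):
    """Hypothetical helper: G = [I_k | P] with P uniform, and H = [P^T | I_{n-k}]."""
    r = n - k
    P = [[rng.randrange(2) for _ in range(r)] for _ in range(k)]

    def enc(x):  # x G = (x, x P)
        return x + [sum(x[i] * P[i][j] for i in range(k)) % 2 for j in range(r)]

    H = [[P[i][j] for i in range(k)] + [int(j == t) for t in range(r)] for j in range(r)]
    return enc, H

def brute_force_decode(y, H, support, k):
    """For each z in Supp(Z), test whether (y - z) H^T = 0; if so, y - z = x G and,
    G being systematic, x is its first k coordinates. O(n^2) work per candidate z."""
    for z in support:
        c = xor(y, z)
        if not any(syndrome(c, H)):
            return c[:k]
    return None
```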

Proposition 3 implies that for any flat $Z$ of entropy $O(\log n)$, there is a code that corrects $Z$ in
polynomial time. Although the construction is not fully explicit, we can obtain such a code with 
high probability. 

Conversely, we can show that the rate achieved in Proposition 3 is almost optimal. 

Proposition 4. Let $Z$ be any flat distribution over $\{0,1\}^n$ of entropy $m$. If a code of rate $R$ corrects $Z$ with error $\epsilon$, then $R \le 1 - m/n + \log(1/(1-\epsilon))/n$.
Proof. Let (Enc, Dec) be a code that corrects $Z$ with error $\epsilon$. For $x \in \{0,1\}^{Rn}$, define $D_x = \{y \in \{0,1\}^n : \mathrm{Dec}(y) = x\}$; that is, $D_x$ is the set of inputs that Dec decodes to $x$. Since the code corrects the flat distribution $Z$ with error $\epsilon$, $\mathrm{Dec}(\mathrm{Enc}(x) + z) = x$ for at least $(1-\epsilon)2^m$ distinct $z \in \mathrm{Supp}(Z)$, and hence $|D_x| \ge (1-\epsilon)2^m$ for every $x \in \{0,1\}^{Rn}$. Since the sets $D_x$ are pairwise disjoint, $\sum_{x \in \{0,1\}^{Rn}} |D_x| \le 2^n$. Therefore, $(1-\epsilon)2^m \cdot 2^{Rn} \le 2^n$, which implies the statement. □
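Propositions 3 and 4 together pin the best rate for a flat $Z$ of entropy $m$ to a window of width $(2\log(1/\epsilon) + \log(1/(1-\epsilon)))/n$. The helper below is a hypothetical illustration that simply evaluates both thresholds; for example, with $n = 1024$, $m = 64$, and $\epsilon = 2^{-10}$, codes exist for $R < 0.91796875$ and none exist for $R$ slightly above $0.9375$.

```python
import math

def rate_window(n, m, eps):
    """Return (achievable, impossible) rate thresholds for a flat Z of entropy m:
    Proposition 3 gives codes for R < achievable; Proposition 4 rules out R > impossible."""
    achievable = 1 - m / n - 2 * math.log2(1 / eps) / n
    impossible = 1 - m / n + math.log2(1 / (1 - eps)) / n
    return achievable, impossible

lo, hi = rate_window(1024, 64, 2 ** -10)
print(lo, hi)
```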

By Proposition 3, one may hope to construct a single code that corrects errors from any flat 
distribution with the same entropy, as constructed in [2] for the case of binary symmetric channels 
by using Justesen’s construction [14]. However, it is impossible to construct such codes. We show 
that for every deterministic coding scheme of rate k/n, there is a flat distribution Z of entropy 
m that is not correctable by the scheme for any 1 < m < k. By combining this result with 
Proposition 1, we can conclude that there is no coding scheme of rate k/n that corrects every flat 
distribution of entropy m with 1 < m < k. 

Proposition 5. For any deterministic code of rate $k/n$ and any $m$ with $1 < m < k$, there is a flat distribution of entropy $m$ that is not correctable by the code with error $\epsilon < 1/2$.

Proof. Define a flat distribution to be the uniform distribution over $2^m$ distinct codewords $c_1, \ldots, c_{2^m}$. If the input to the decoder is $c_i + c_j$ for distinct $i, j \in [2^m]$, the decoder cannot distinguish the case where the transmitted codeword is $c_i$ (with error $c_j$) from the case where it is $c_j$ (with error $c_i$). Thus, the decoder outputs the wrong answer with probability at least 1/2 for at least one of the two cases. □
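The pairing argument can be made concrete by counting failures. This is a sketch under stated assumptions: codewords are distinct Python integers over GF(2), message $i$ is encoded as `codewords[i]`, $Z$ is uniform over the same list, and the nearest-codeword decoder is a hypothetical example (the counting holds for any deterministic decoder). Since $c_i + c_j = c_j + c_i$, each unordered pair forces at least one failure, so the total number of failures is at least $M(M-1)/2$ for $M = 2^m$.

```python
import random

def failures_per_message(codewords, dec):
    """Z is uniform over `codewords`, and message i is encoded as codewords[i].
    For each i, count the errors z in Supp(Z) on which the deterministic decoder fails."""
    M = len(codewords)
    fails = [0] * M
    for i in range(M):
        for z in codewords:
            if dec(codewords[i] ^ z) != i:   # received word c_i + z over GF(2)
                fails[i] += 1
    return fails

def nearest_codeword_decoder(codewords):
    """Hypothetical deterministic decoder: closest codeword in Hamming distance,
    ties broken by the smaller index."""
    def dec(y):
        return min(range(len(codewords)),
                   key=lambda i: (bin(codewords[i] ^ y).count("1"), i))
    return dec

cw = random.Random(3).sample(range(2 ** 10), 8)   # M = 2^m = 8 distinct codewords
fails = failures_per_message(cw, nearest_codeword_decoder(cw))
print(fails, sum(fails))
```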

3.2 Errors from Linear Subspaces 

Let $Z = \{z_1, z_2, \ldots, z_m\} \subset \mathbb{F}^n$ be a set of linearly independent vectors. We can construct a linear code that corrects additive errors from the linear span of $Z$.

Proposition 6. There is a linear code of rate $1 - m/n$ that corrects the linear span of $Z$ by syndrome decoding.

Proof. Consider $n - m$ vectors $w_{m+1}, \ldots, w_n \in \mathbb{F}^n$ such that the set $\{z_1, z_2, \ldots, z_m, w_{m+1}, \ldots, w_n\}$ forms a basis of $\mathbb{F}^n$. Then, there is a linear transformation $T: \mathbb{F}^n \to \mathbb{F}^m$ such that $T(z_i) = e_i$ and $T(w_i) = 0$, where $e_i$ is the vector with 1 in the $i$-th position and 0 elsewhere. Let $H$ be the matrix in $\mathbb{F}^{m \times n}$ such that $xH^{\mathsf T} = T(x)$, and consider a code with parity-check matrix $H$. Let $z = \sum_{i=1}^m a_i z_i$ be a vector in the linear span of $Z$, where $a_i \in \mathbb{F}$. Since $zH^{\mathsf T} = (\sum_{i=1}^m a_i z_i)H^{\mathsf T} = \sum_{i=1}^m a_i e_i = (a_1, \ldots, a_m)$, the code can correct the error $z$ by syndrome decoding with $\mathrm{Rec}(a_1, \ldots, a_m) = \sum_{i=1}^m a_i z_i$. Since $H \in \mathbb{F}^{m \times n}$ is the parity-check matrix, the rate of the code is $(n - m)/n$. □
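The construction in the proof can be carried out mechanically over GF(2). The sketch below is a hypothetical illustration, not the authors' code: it extends $z_1, \ldots, z_m$ to a basis by greedily adding unit vectors, inverts the basis matrix $B$ by Gauss-Jordan elimination, takes $H^{\mathsf T}$ to be the first $m$ columns of $B^{-1}$ (so that $xH^{\mathsf T} = T(x)$), encodes messages along $w_{m+1}, \ldots, w_n$, and decodes with $\mathrm{Rec}(a) = \sum_i a_i z_i$.

```python
def add(u, v):
    return [(a + b) % 2 for a, b in zip(u, v)]

def vec_mat(v, M):
    """Row vector v times matrix M over GF(2)."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) % 2 for j in range(len(M[0]))]

def rank_gf2(rows):
    rows = [r[:] for r in rows]
    rank = 0
    for col in range(len(rows[0])):
        piv = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if piv is None:
            continue
        rows[rank], rows[piv] = rows[piv], rows[rank]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                rows[i] = add(rows[i], rows[rank])
        rank += 1
    return rank

def inverse_gf2(B):
    n = len(B)
    aug = [B[i][:] + [int(i == j) for j in range(n)] for i in range(n)]
    for col in range(n):
        piv = next(i for i in range(col, n) if aug[i][col])  # B must be invertible
        aug[col], aug[piv] = aug[piv], aug[col]
        for i in range(n):
            if i != col and aug[i][col]:
                aug[i] = add(aug[i], aug[col])
    return [row[n:] for row in aug]

def subspace_code(zs):
    """Build (enc, dec) correcting errors in span(zs), following Proposition 6."""
    n, m = len(zs[0]), len(zs)
    basis = [z[:] for z in zs]
    for j in range(n):                      # extend z_1..z_m by unit vectors w_{m+1..n}
        e = [int(t == j) for t in range(n)]
        if rank_gf2(basis + [e]) > len(basis):
            basis.append(e)
    B_inv = inverse_gf2(basis)              # x B^{-1} = coordinates of x in the basis
    HT = [row[:m] for row in B_inv]         # x H^T = T(x) = first m coordinates

    def enc(u):                             # u in F^{n-m} -> sum_i u_i w_{m+i}
        return vec_mat(u, basis[m:])

    def dec(y):
        a = vec_mat(y, HT)                  # syndrome y H^T = (a_1, ..., a_m)
        z = vec_mat(a, zs)                  # Rec(a) = sum_i a_i z_i
        c = add(y, z)                       # y - z over GF(2)
        return vec_mat(c, B_inv)[m:]        # coordinates of c along w_{m+1..n}

    return enc, dec
```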

3.3 Pseudorandom Errors 

We show that no efficient coding scheme can correct pseudorandom errors. 

Proposition 7. Assume that a one-way function exists. Then, for any $0 < \epsilon < 1$, there is a samplable distribution $Z$ over $\{0,1\}^n$ such that $H(Z) < n^{\epsilon}$ and no polynomial-time algorithms (Enc, Dec) can correct $Z$.

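Proof. If a one-way function exists, there is a pseudorandom generator $G: \{0,1\}^{n^{\epsilon}} \to \{0,1\}^n$ secure against any polynomial-time algorithm [12]. Then, the distribution $Z = G(U_{n^{\epsilon}})$ is not correctable by polynomial-time algorithms (Enc, Dec). Otherwise, we could construct a polynomial-time distinguisher for the pseudorandom generator from (Enc, Dec): on input $w$, pick a random message $x$ and output 1 if and only if $\mathrm{Dec}(\mathrm{Enc}(x) + w) = x$. On $w \sim Z$ this distinguisher outputs 1 with probability at least one minus the decoding error, whereas on $w \sim U_n$ it outputs 1 with probability at most $2^{-k}$, since $\mathrm{Enc}(x) + w$ is then uniform and independent of $x$. This gap contradicts the security of $G$. □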
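The reduction in the proof turns any code that corrects $Z$ into a distinguisher between $Z$ and $U_n$. The sketch below illustrates only that mechanism, with hypothetical names throughout: the toy $Z$ (errors confined to the lowest 4-bit block of a 16-bit word) is obviously not pseudorandom, and the toy code repeats a 4-bit message in four blocks and decodes from the second block. The estimated advantage is close to $1 - 2^{-4}$; against a secure pseudorandom generator no efficient code could produce a noticeable advantage, which is the contradiction used above.

```python
import random

def code_based_distinguisher(w, enc, dec, k, rng):
    """D(w) = 1 iff Dec(Enc(x) + w) = x for a fresh random k-bit message x (GF(2), ints)."""
    x = rng.getrandbits(k)
    return int(dec(enc(x) ^ w) == x)

def estimate_advantage(sample_z, enc, dec, k, n, trials=2000, seed=0):
    """Pr[D(w)=1 | w ~ Z] - Pr[D(w)=1 | w ~ U_n], estimated by sampling."""
    rng = random.Random(seed)
    on_z = sum(code_based_distinguisher(sample_z(rng), enc, dec, k, rng) for _ in range(trials))
    on_u = sum(code_based_distinguisher(rng.getrandbits(n), enc, dec, k, rng) for _ in range(trials))
    return (on_z - on_u) / trials

# Toy instance (hypothetical): 4-bit messages repeated in four 4-bit blocks.
def toy_enc(x):
    return x | (x << 4) | (x << 8) | (x << 12)

def toy_dec(y):
    return (y >> 4) & 0xF          # read the second block, ignoring the first

def toy_z(rng):
    return rng.getrandbits(4)      # errors only in the lowest block

adv = estimate_advantage(toy_z, toy_enc, toy_dec, k=4, n=16)
print(f"estimated advantage: {adv:.3f}")
```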
4 Errors with Membership Test


Since pseudorandom distributions are not correctable by efficient schemes, we investigate the correctability of distributions that are not pseudorandom. Among such distributions, we consider those for which membership can be tested efficiently. A distribution $Z$ is called a distribution with membership test if there is a polynomial-time algorithm $D$ such that $D(z) = 1 \iff z \in \mathrm{Supp}(Z)$. Since $\Pr[D(Z) = 1] = 1$ while $\Pr[D(U_n) = 1] = |\mathrm{Supp}(Z)|/2^n$, the algorithm $D$ can distinguish $Z$ from the uniform distribution, and thus $Z$ is not pseudorandom.

We show that there is an oracle relative to which there exists a samplable distribution with 
membership test that is not correctable by efficient coding schemes with low rate. 

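Let $N = 2^n$, $K = 2^k$, and $M = 2^m$. Let $\mathcal{F}$ be the set of injective functions $f: \{0,1\}^m \to \{0,1\}^n$. For each $f \in \mathcal{F}$, define an oracle $\mathcal{O}_f$ such that

$$
\mathcal{O}_f(b, y) =
\begin{cases}
O_f(y) & \text{if } b = 0,\ y \in \{0,1\}^m,\\
O_f^{-1}(y) & \text{if } b = 1,\ y \in \{0,1\}^n,\\
\bot & \text{otherwise},
\end{cases}
\qquad
O_f^{-1}(y) =
\begin{cases}
1 & \text{if } y \in f(\{0,1\}^m),\\
0 & \text{if } y \notin f(\{0,1\}^m),
\end{cases}
$$

and $O_f(y) = f(y)$.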
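For intuition, the oracle can be simulated for tiny parameters. This is a hypothetical sketch, not part of the construction's analysis: $f$ is a uniformly random injective map stored as a table, strings are encoded as Python integers, and `None` stands in for $\bot$. The membership half of the oracle is exactly what makes $f(U_m)$ a distribution with membership test.

```python
import random

BOT = None  # stands in for the symbol ⊥

class OracleOf:
    """Hypothetical sketch of O_f for a random injective f: {0,1}^m -> {0,1}^n."""

    def __init__(self, m, n, seed=0):
        self.m, self.n = m, n
        self.table = random.Random(seed).sample(range(2 ** n), 2 ** m)  # f(y) = table[y]
        self.image = set(self.table)

    def sample(self, y):          # O_f(y) = f(y)
        return self.table[y]

    def member(self, w):          # O_f^{-1}(w) = 1 iff w is in f({0,1}^m)
        return int(w in self.image)

    def __call__(self, b, y):     # combined oracle O_f(b, y)
        if b == 0 and 0 <= y < 2 ** self.m:
            return self.sample(y)
        if b == 1 and 0 <= y < 2 ** self.n:
            return self.member(y)
        return BOT

O = OracleOf(4, 12)
print(O(0, 5), O(1, O(0, 5)), O(2, 0))
```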


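Let $\mathrm{correct}_{\mathcal{F}}$ be the set of functions $f \in \mathcal{F}$ for which there exist oracle circuits (Enc, Dec) that make $q$ queries to oracle $\mathcal{O}_f$ and correct $f(U_m)$ with rate $k/n$. For each $f \in \mathrm{correct}_{\mathcal{F}}$ and the corresponding (Enc, Dec), we define

$$
\begin{aligned}
\mathrm{invert}_f &= \{y \in \{0,1\}^m : \text{for any } x \in \{0,1\}^k, \text{ on input } \mathrm{Enc}(x) + f(y), \text{ Dec queries } O_f \text{ on } y\},\\
\mathrm{forge}_f &= \{y \in \{0,1\}^m : \text{for some } x \in \{0,1\}^k, \text{ on input } \mathrm{Enc}(x) + f(y), \text{ Dec does not query } O_f \text{ on } y\}.
\end{aligned}
$$

Note that $\mathrm{invert}_f$ and $\mathrm{forge}_f$ form a partition of $\{0,1\}^m$. We also define

$$
\begin{aligned}
\mathrm{invertible} &= \{f \in \mathrm{correct}_{\mathcal{F}} : |\mathrm{invert}_f| \ge \epsilon \cdot 2^m\},\\
\mathrm{forgeable} &= \{f \in \mathrm{correct}_{\mathcal{F}} : |\mathrm{forge}_f| \ge \delta \cdot 2^m\},
\end{aligned}
$$

where $\epsilon$ and $\delta$ are any positive constants satisfying $\epsilon + \delta = 1$. Note that $\mathrm{correct}_{\mathcal{F}} = \mathrm{invertible} \cup \mathrm{forgeable}$.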

Intuitively, if $f$ is in invertible, then there is a small circuit that inverts $f$: on input $f(y)$, compute $\mathrm{Enc}(x) + f(y)$ and monitor the oracle queries that $\mathrm{Dec}(\mathrm{Enc}(x) + f(y))$ makes to $O_f$. Since a random function is one-way with high probability, we can show that the set invertible is small. Similarly, if $f$ is in forgeable, then Dec corrects $f(y)$ without querying $O_f$ on $y$. This means that $f(y)$ can be described using Dec and $\mathrm{Enc}(x) + f(y)$; thus, if $\mathrm{Enc}(x) + f(y)$ has a short description, the set forgeable is small.

To formalize this intuition, we use the reconstruction paradigm of [8] and show that both invertible and forgeable are small.
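The counting step behind the reconstruction paradigm can be stated explicitly. The display below is a standard consequence of the definitions rather than a formula quoted from the text: $\mathcal{F}$ contains exactly $N!/(N-M)!$ injective functions, so if, for fixed (Enc, Dec), every $f$ in a set $\mathcal{S} \subseteq \mathcal{F}$ is uniquely determined by a description drawn from at most $2^L$ possibilities, then

$$
|\mathcal{F}| = \binom{N}{M} M! = \frac{N!}{(N-M)!},
\qquad
\frac{|\mathcal{S}|}{|\mathcal{F}|} \le \frac{2^{L}}{\binom{N}{M} M!}.
$$

Lemmas 1 and 3 supply such descriptions, with $L$ equal to the stated bit counts.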

First, we show that every $f \in \mathrm{invertible}$ has a short description.

Lemma 1. Take any $f \in \mathrm{invertible}$ and the corresponding pair of oracle circuits (Enc, Dec) that makes at most $q$ queries to $\mathcal{O}_f$ in total and corrects $f(U_m)$ with rate $k/n$. Then $f$ can be described using at most

$$
\log\binom{N}{c} + \log\binom{M}{c} + \log\left(\binom{N-c}{M-c}(M-c)!\right)
$$

bits, given (Enc, Dec), where $c = \epsilon M/q$.

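Proof. First, consider an oracle circuit $A$ such that, on input $z$, $A$ picks any $x \in \{0,1\}^k$ and simulates Dec on input $\mathrm{Enc}(x) + z$. Then, for any $y \in \mathrm{invert}_f$, on input $f(y)$, $A$ outputs $y$ by making at most $q$ queries to $\mathcal{O}_f$: since Dec queries $O_f$ on $y$, $A$ can recognize $y$ as the query whose answer equals $f(y)$.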
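The circuit $A$ can be sketched as a wrapper that runs the decoder with an instrumented oracle. Everything here is hypothetical and for illustration only: decoders are modeled as Python functions `dec(word, oracle)` that receive the sampler oracle $O_f$ as a callable, and the toy decoder queries $O_f$ on every point (so every $y$ lies in $\mathrm{invert}_f$). $A$ returns the first query whose answer equals its input $z$, which is $f^{-1}(z)$ because $f$ is injective.

```python
import random
from typing import Callable, Optional

def make_inverter(enc, dec, x0=0):
    """Hypothetical sketch of the circuit A from the proof of Lemma 1.
    On input z, A runs Dec on Enc(x0) + z with a wrapped sampler oracle and
    returns the first query y whose answer O_f(y) equals z."""
    def A(z: int, oracle: Callable[[int], int]) -> Optional[int]:
        found = None
        def watched(y: int) -> int:
            nonlocal found
            answer = oracle(y)
            if found is None and answer == z:
                found = y
            return answer
        dec(enc(x0) ^ z, watched)   # + over GF(2) is XOR on ints
        return found
    return A

# Toy instantiation (hypothetical): m = 3, n = 8, 2-bit messages repeated 4 times.
M_BITS = 3

def toy_enc(x):
    return x | (x << 2) | (x << 4) | (x << 6)

def toy_dec(w, oracle):
    """Queries O_f on every y (so q = 2^m), then returns the first message x
    such that w - f(y) = Enc(x) for some y."""
    images = [oracle(y) for y in range(2 ** M_BITS)]
    for fy in images:
        c = w ^ fy
        if toy_enc(c & 0b11) == c:
            return c & 0b11
    return None

f_table = random.Random(0).sample(range(256), 2 ** M_BITS)   # a random injective f
A = make_inverter(toy_enc, toy_dec)
print([A(f_table[y], lambda t: f_table[t]) for y in range(2 ** M_BITS)])
```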

Next, we show that any $f \in \mathrm{invertible}$ has a short description given $A$. Without loss of generality, we assume that $A$ makes distinct queries to $O_f$. We also assume that on input $f(y)$, $A$ always queries $O_f$ on $y$ before it outputs $y$. We will show that there is a subset $T \subseteq f(\mathrm{invert}_f)$ such that $f$ can be described given $T$, $B(T)$, and $f|_{\{0,1\}^m \setminus B(T)}$, where $B(T) = \{y \in \{0,1\}^m : y \leftarrow A(z), z \in T\}$.

We describe how to construct T below. 

Construct-T: 

1. Initially, $T$ is empty, and all elements in $T^* = f(\mathrm{invert}_f)$ are candidates for inclusion in $T$.

2. Choose the lexicographically smallest $z$ from $T^*$, put $z$ in $T$, and remove $z$ from $T^*$.

3. Simulate $A$ on input $z$, and halt the simulation immediately after $A$ queries $O_f$ on $y = f^{-1}(z)$. Let $y'_1, \ldots, y'_p$ be the queries that $A$ makes to $O_f$, where $y'_p = y$ and $p \le q$.

• Remove $f(y'_1), \ldots, f(y'_{p-1})$ from $T^*$. (This means that these elements will never belong to $T$, and in simulating $A(z)$ in the recovering phase, the answers to these queries are obtained from the look-up table for $f$.)

• Continue to remove the lexicographically smallest $z$ from $T^*$ until we have removed exactly $q - 1$ elements in Step 3.

4. Return to Step 2.

Next, we describe how to reconstruct $f$ from $T$, $B(T)$, and $f|_{\{0,1\}^m \setminus B(T)}$. We show how to recover the look-up table for $f$ on values in $B(T)$.

Recover-f:

1. Choose the lexicographically smallest element $z \in T$, and remove it from $T$.

2. Simulate $A$ on input $z$, and halt the simulation immediately after $A$ queries $O_f$ on a point $y$ for which the answer does not exist in the look-up table for $f$. Since the query $y$ satisfies $y = f^{-1}(z)$, add the entry $(y, z)$ to the look-up table.

In what follows, we explain why we can correctly simulate $A(z)$.

• Since $B(T)$ and $f|_{\{0,1\}^m \setminus B(T)}$ are given, we can answer all queries to $O_f^{-1}$.

• For any query $y'$ to $O_f$, either (1) $y' \notin B(T)$, or (2) $y'$ is the output of $A$ on an input $z'$ such that $z' \in T$ and $z'$ is lexicographically smaller than $z$. In either case, the look-up table has the corresponding entry, and thus we can answer the query.

3. Return to Step 1.

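In each iteration of Construct-T, we add one element to $T$ and remove exactly $q$ elements from $T^*$. Since initially $|T^*| = |f(\mathrm{invert}_f)| \ge \epsilon M$, the construction runs for at least $c = \epsilon M/q$ iterations, and we may stop it after exactly $c$ iterations, so that $|T| = c$.

The sets $T$ and $B(T)$, and the look-up table for $f|_{\{0,1\}^m \setminus B(T)}$, can be described using $\log\binom{N}{c}$, $\log\binom{M}{c}$, and $\log\left(\binom{N-c}{M-c}(M-c)!\right)$ bits, respectively. Therefore, the statement follows. □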
We show that the fraction of $f \in \mathcal{F}$ for which $f \in \mathrm{invertible}$ and $f(U_m)$ is correctable is small.

Lemma 2. If $m > 3\log s + \log n + O(1)$, then the fraction of functions $f \in \mathcal{F}$ such that $f \in \mathrm{invertible}$ and $f(U_m)$ can be corrected by a pair of oracle circuits (Enc, Dec) of total size $s$ is less than $2^{-(ns\log s + 1)}$ for all sufficiently large $n$.


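Proof. It follows from Lemma 1 that, given (Enc, Dec), the fraction is

$$
\frac{|\mathrm{invertible}|}{|\mathcal{F}|}
\le \frac{\binom{N}{c}\binom{M}{c}\binom{N-c}{M-c}(M-c)!}{\binom{N}{M}M!}
= \frac{\binom{M}{c}}{c!},
$$

where $c = \epsilon M/q$. By using the fact that $q \le s$ and the inequalities $\binom{n}{k} \le (en/k)^k$ and $n! \ge (n/e)^n$, the expression is upper bounded by

$$
\left(\frac{eM}{c}\right)^{c}\left(\frac{e}{c}\right)^{c}
= \left(\frac{e^2 q^2}{\epsilon^2 M}\right)^{\epsilon M/q}
\le \left(\frac{1}{2}\right)^{ns\log s + 1}
$$

for all sufficiently large $n$. The last inequality follows from the facts that

$$
\frac{e^2 q^2}{\epsilon^2 M} \le \frac{e^2 q^2}{\epsilon^2\,\Omega(s^3 n)} \le \frac{1}{2}
\qquad\text{and}\qquad
\frac{\epsilon M}{q} \ge \frac{\epsilon\,\Omega(s^3 n)}{q} \ge ns\log s + 1.
$$

□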
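The chain of inequalities can be checked numerically in log scale. The sketch below assumes illustrative parameters ($n = 64$, $s = q = 16$, $m = 30$, $\epsilon = 1/2$, which satisfy $m > 3\log s + \log n$) and uses `math.lgamma` so that $\binom{M}{c}$ and $c!$ never need to be formed; it compares $\log_2(\binom{M}{c}/c!)$, the bound $c\log_2(e^2q^2/(\epsilon^2 M))$, and the target $-(ns\log s + 1)$. The helper names are hypothetical.

```python
import math

def log2_binom(a, b):
    return (math.lgamma(a + 1) - math.lgamma(b + 1) - math.lgamma(a - b + 1)) / math.log(2)

def log2_factorial(a):
    return math.lgamma(a + 1) / math.log(2)

def lemma2_check(n, s, m, eps):
    q = s                    # worst case allowed by q <= s
    M = 2 ** m
    c = eps * M / q          # c = eps * M / q, as in Lemma 1
    exact = log2_binom(M, c) - log2_factorial(c)             # log2(binom(M, c) / c!)
    bound = c * math.log2(math.e ** 2 * q ** 2 / (eps ** 2 * M))
    target = -(n * s * math.log2(s) + 1)
    return exact, bound, target

exact, bound, target = lemma2_check(64, 16, 30, 0.5)
print(f"exact={exact:.4g}  bound={bound:.4g}  target={target:.4g}")
```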


Next, we show that every $f \in \mathrm{forgeable}$ has a short description.


Lemma 3. Take any $f \in \mathrm{forgeable}$ and the corresponding pair of oracle circuits (Enc, Dec) that makes at most $q$ queries to $\mathcal{O}_f$ in total and corrects $f(U_m)$ with rate $k/n$. Then $f$ can be described using at most

$$
\log\binom{M}{d} + \log\left(\binom{N-d}{M-d}(M-d)!\right) + d(k + m + \log q)
$$

bits, given (Enc, Dec), where $d = \delta M/q$.


Proof. First, consider an oracle circuit $A$ such that, on input $w$, $A$ obtains $x$ by simulating Dec on input $w$, queries $O_f^{-1}$ on $w - \mathrm{Enc}(x)$, and outputs $\bot$ if $O_f^{-1}(w - \mathrm{Enc}(x)) = 0$, and $x$ otherwise. Then, on input $w$, $A$ outputs $\bot$ if $w \notin \mathrm{Enc}(\{0,1\}^k) + f(\{0,1\}^m)$, and $\mathrm{Dec}(w)$ otherwise.

Next, we show that any $f \in \mathrm{forgeable}$ has a short description given $A$. Without loss of generality, we assume that $A$ makes distinct queries to $O_f$ and $O_f^{-1}$. We also assume that for $x \in \{0,1\}^k$ and $y \in \{0,1\}^m$, $A(\mathrm{Enc}(x) + f(y))$ always queries $O_f^{-1}$ on $f(y)$ before it outputs $x$. Note that for $y \in \mathrm{forge}_f$, there is some $x \in \{0,1\}^k$ such that, on input $\mathrm{Enc}(x) + f(y)$, $A$ does not query $O_f$ on $y$.

We will show that there is a subset $Y \subseteq \mathrm{forge}_f$ such that $f$ can be described given $Y$, $f|_{\{0,1\}^m \setminus Y}$, and a set of advice strings $\{(x_y, a_y, b_y) \in \{0,1\}^k \times [M] \times [q] : y \in Y\}$. For $x \in \{0,1\}^k$, we define $D(x) = \{\mathrm{Enc}(x) + f(y) : y \in \{0,1\}^m\}$. Note that $|D(x)| = M$ for any $x \in \{0,1\}^k$.

We describe how to construct Y below. 

Construct-Y:

1. Initially, $Y$ is empty. All elements in $Y^* = \mathrm{forge}_f$ are candidates for inclusion in $Y$. For every $x \in \{0,1\}^k$, set $D_x = \{\mathrm{Enc}(x) + f(y) : y \in \mathrm{forge}_f\}$. We write $\mathcal{D}^k = \bigcup_{x \in \{0,1\}^k} D_x$.

2. Choose the lexicographically smallest $y$ from $Y^*$, put $y$ in $Y$, and remove $y$ from $Y^*$.

3. Choose the lexicographically smallest $w$ from the set of $\mathrm{Enc}(x) + f(y) \in D_x$ such that $A$ does not query $O_f$ on $y$. If $w = \mathrm{Enc}(x) + f(y)$, set $x_y = x$. Then, for every $x' \in \{0,1\}^k$, remove $\mathrm{Enc}(x') + f(y)$ from $D_{x'}$. (This removal means that hereafter there are no elements in $\mathcal{D}^k$ for which $A$ outputs some $x$ such that $f(y)$ is the error vector.) When $w$ is the lexicographically $t$-th smallest element in $D(x)$, set $a_y = t$ (so that in the recovering phase we can recognize that the $a_y$-th element in $D(x)$ is $w$).

4. Simulate $A$ on input $w$, and halt the simulation immediately after $A$ queries $O_f^{-1}$ on $f(y)$. Let $y'_1, \ldots, y'_p$ be the queries that $A$ makes to $O_f$, and let $z'_1, \ldots, z'_r = f(y)$ be the queries that $A$ makes to $O_f^{-1}$. Set $b_y = r$ (so that in the recovering phase we can recognize that the $b_y$-th query that $A$ makes to $O_f^{-1}$ is $f(y)$).

 (a) For every $x' \in \{0,1\}^k$, remove $\mathrm{Enc}(x') + f(y'_1), \ldots, \mathrm{Enc}(x') + f(y'_p)$ from $D_{x'}$.

 (b) For every $i \in [r]$, if $z'_i \in f(\mathrm{forge}_f)$, then for every $x' \in \{0,1\}^k$, remove $\mathrm{Enc}(x') + z'_i$ from $D_{x'}$; otherwise, do nothing.

 (c) Continue to remove the elements $\mathrm{Enc}(x') + f(y)$ from $D_{x'}$, for every $x' \in \{0,1\}^k$, for the lexicographically smallest $w = \mathrm{Enc}(x) + f(y) \in \mathcal{D}^k$, until we have removed exactly $(q-1)K$ elements from $\mathcal{D}^k$ in Step 4.

5. Return to Step 2.

Next, we describe how to reconstruct $f$ from $Y$, $f|_{\{0,1\}^m \setminus Y}$, and $\{(x_y, a_y, b_y) \in \{0,1\}^k \times [M] \times [q] : y \in Y\}$. We show how to recover the look-up table for $f$ on values in $Y$.

Recover-f:

1. Choose the lexicographically smallest $y \in Y$, and remove it from $Y$. Then, choose the lexicographically $a_y$-th smallest element $w$ from $D(x_y)$.

2. Simulate $A$ on input $w$, and halt the simulation immediately after $A$ makes the $b_y$-th query to $O_f^{-1}$. Since the $b_y$-th query is $f(y)$, add the entry $(y, f(y))$ to the look-up table.

In what follows, we explain why we can correctly simulate A(w). 

• For any query $y'$ to $O_f$, either (1) $y' \notin Y$ or (2) $y'$ is lexicographically smaller than $y$. In case (1), we can answer the query by using $f|_{\{0,1\}^m \setminus Y}$. In case (2), since $y$ was chosen as the lexicographically smallest element such that $A$ does not query $O_f$ on $y$, the look-up table already contains the answer to the query.

• Consider any of the first $b_y - 1$ queries $z'$ to $O_f^{-1}$. If $z' \in f(\{0,1\}^m)$, namely $z' = f(y')$ for some $y'$, then either (1) $y' \notin Y$ or (2) $y'$ is lexicographically smaller than $y$. In either case, the look-up table has the entry $(y', z')$. If $z' \notin f(\{0,1\}^m)$, there is no entry for $z'$ in the look-up table. Thus, we can answer the query by saying “yes” if $z'$ is in the look-up table, and “no” otherwise.

3. Return to Step 1. 

In each iteration of Construct-Y, we add one element to $Y$ and remove exactly $qK$ elements from $\mathcal{D}_k$. Since the initial size of $\mathcal{D}_k$ is at least $\delta KM$, the final size of $Y$ is at least $d = \delta M/q$.
The set $Y$, the look-up table for $f|_{\{0,1\}^m \setminus Y}$, and the set $\{(x_y, a_y, b_y) \in \{0,1\}^k \times [M] \times [q] : y \in Y\}$ can be described using $\log\binom{M}{d}$, $\log\bigl(\binom{N}{M-d}(M-d)!\bigr)$, and $d(k + m + \log q)$ bits, respectively. Therefore, the statement follows. □


We show that the fraction of $f \in \mathcal{F}$ for which $f \in \mathrm{forgeable}$ and $f(U_m)$ is correctable is small.

Lemma 4. If $m > 3\log s + \log n + O(1)$ and $m < n - k - 2\log s - O(1)$, then the fraction of functions $f \in \mathcal{F}$ such that $f \in \mathrm{forgeable}$ and $f(U_m)$ can be corrected by a pair of oracle circuits $(\mathrm{Enc}, \mathrm{Dec})$ of total size $s$ is less than $2^{-(sn\log s + 1)}$ for all sufficiently large $n$.

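Proof. It follows from Lemma 3 that, given $(\mathrm{Enc}, \mathrm{Dec})$, the fraction is

$$\frac{|\mathrm{forgeable}|}{|\mathcal{F}|} \le \frac{\binom{M}{d}\binom{N}{M-d}(M-d)!\;2^{d(k+m+\log q)}}{\binom{N}{M}M!} \le \left(\frac{e^2q^2KM}{\delta N}\right)^{d} < \left(\frac{1}{2}\right)^{d} < 2^{-(sn\log s+1)}$$

for all sufficiently large $n$. The last two inequalities follow from the facts that

$$\frac{e^2q^2KM}{\delta N} \le \frac{e^2q^2}{\delta\,\Omega(s^2)} < \frac{1}{2}$$

and

$$d = \frac{\delta M}{q} \ge \frac{\delta\,\Omega(s^3 n)}{q} > ns\log s + 1.$$

□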
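The bound $\binom{M}{d}\binom{N}{M-d}(M-d)!\,2^{d(k+m+\log q)} \big/ \bigl(\binom{N}{M}M!\bigr) \le \bigl(e^2q^2KM/(\delta N)\bigr)^d$ used in this proof can be checked factor by factor. The derivation below is a sketch under stated assumptions: $d = \delta M/q$, the count $\binom{N}{M-d}(M-d)!$ for the table of $f$ outside $Y$, and $N - M \ge N/e$, which holds for large $n$ since $M/N = 2^{m-n}$ is small under the hypothesis $m < n - k - 2\log s - O(1)$.

$$\binom{M}{d} \le \left(\frac{eM}{d}\right)^{d} = \left(\frac{eq}{\delta}\right)^{d}, \qquad 2^{d(k+m+\log q)} = (KMq)^{d},$$

$$\frac{\binom{N}{M-d}(M-d)!}{\binom{N}{M}M!} = \frac{(N-M)!}{(N-M+d)!} \le \frac{1}{(N-M)^{d}} \le \left(\frac{e}{N}\right)^{d},$$

$$\text{so the product is at most } \left(\frac{eq}{\delta}\cdot KMq\cdot\frac{e}{N}\right)^{d} = \left(\frac{e^2q^2KM}{\delta N}\right)^{d}.$$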


We obtain the main result of this section. 

Theorem 1. For any $m$ and $k$ satisfying $3\log s + \log n + O(1) < m < n - k - 2\log s - O(1)$, there exist injective functions $f : \{0,1\}^m \to \{0,1\}^n$ such that, given oracle access to $O_f$, (1) $f(U_m)$ is a samplable distribution with membership test of entropy $m$, and (2) $f(U_m)$ cannot be corrected with rate $k/n$ by oracle circuits of size $s$.

Proof. Since $\mathrm{correct}_f = \mathrm{invertible} \cup \mathrm{forgeable}$, it follows from Lemmas 2 and 4 that for a fixed $(\mathrm{Enc}, \mathrm{Dec})$ of size $s$, the fraction of functions $f \in \mathcal{F}$ such that $(\mathrm{Enc}, \mathrm{Dec})$ corrects $f(U_m)$ with rate $k/n$ is less than $2^{-sn\log s}$. Since there are at most $2^{sn\log s}$ circuits of size $s$, there are functions $f \in \mathcal{F}$ such that $f(U_m)$ cannot be corrected with rate $k/n$ by oracle circuits of size $s$. Given oracle access to $O_f$, $f(U_m)$ is samplable. Since $f$ is injective, $f(U_m)$ has entropy $m$. □
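The counting step at the end of this proof is a union bound over all pairs $(\mathrm{Enc}, \mathrm{Dec})$ of size $s$. With $f$ drawn uniformly from $\mathcal{F}$, it can be written out as follows; since the probability is strictly less than 1, some $f \in \mathcal{F}$ lies outside the event.

$$\Pr_{f \leftarrow \mathcal{F}}\bigl[\exists (\mathrm{Enc},\mathrm{Dec}) \text{ of size } s \text{ correcting } f(U_m) \text{ with rate } k/n\bigr] \le \sum_{(\mathrm{Enc},\mathrm{Dec})} \Pr_{f \leftarrow \mathcal{F}}\bigl[(\mathrm{Enc},\mathrm{Dec}) \text{ corrects } f(U_m)\bigr] < 2^{sn\log s} \cdot 2^{-sn\log s} = 1.$$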

The following corollary immediately follows. 

Corollary 1. For any $m$ and $k$ satisfying $\omega(\log n) < m < n - k - \omega(\log n)$, there exists an oracle relative to which there exists a samplable distribution with membership test of entropy $m$ that cannot be corrected with rate $k/n$ by polynomial-size circuits.
5 Necessity of One-Way Functions


We show that if one-way functions do not exist, then any samplable flat distribution of entropy $m$ is correctable by an efficient coding scheme of rate $1 - m/n - O(\log n/n)$. For this, we use a technique from the proof of [25, Theorem 6.3], which shows that one-way functions are necessary for separating pseudoentropy and compressibility. We observe that this proof uses a family of linear hash functions to obtain an efficient compression function. Since a linear compression function is the dual object of a linear code that corrects additive errors, we can use a family of linear hash functions to construct an efficient decoder.

Definition 5 ([13]). We say a function $f$ is distributionally one-way if it is computable in polynomial time and there exists a constant $c > 0$ such that for every probabilistic polynomial-time algorithm $A$, the statistical distance between $(x, f(x))$ and $(A(f(x)), f(x))$ is at least $1/n^c$, where $x \sim U_n$.

Theorem 2 ([13]). If there is a distributionally one-way function, then there is a one-way function. 

Theorem 3. If one-way functions do not exist, then any samplable flat distribution $Z$ over $\{0,1\}^n$ of entropy $m$ can be corrected with rate $1 - m/n - (c\log n)/n$ and error $O(n^{-c})$ for any constant $c > 0$ by polynomial-time coding schemes.

Proof. Let $Z = f(U_m)$ for an efficiently computable function $f$. Consider a family of linear universal hash functions $\mathcal{H} = \{h : \{0,1\}^n \to \{0,1\}^{m+2c\log n}\}$, where universality means that for any distinct $x, x' \in \{0,1\}^n$, $\Pr_{h\in\mathcal{H}}[h(x) = h(x')] \le 2^{-(m+2c\log n)}$, and linearity means that for any $x, x' \in \{0,1\}^n$ and $a, b \in \{0,1\}$, $h(ax + bx') = ah(x) + bh(x')$. For each $h \in \mathcal{H}$, we define $C_h = \{x \in \mathrm{Supp}(Z) : \exists x' \in \mathrm{Supp}(Z) \text{ s.t. } x' \ne x \wedge h(x) = h(x')\}$. Namely, $C_h$ is the set of inputs with collisions under $h$. By a union bound over the $2^m$ elements of $\mathrm{Supp}(Z)$, it holds that for any $x \in \mathrm{Supp}(Z)$,

$$\Pr_{h\in\mathcal{H}}\bigl[\exists x' \in \mathrm{Supp}(Z) : x' \ne x \wedge h(x') = h(x)\bigr] \le \frac{2^m}{2^{m+2c\log n}} = \frac{1}{n^{2c}}.$$

Thus, $\mathbb{E}_h[|C_h|] \le 2^m/n^{2c}$. We say $h \in \mathcal{H}$ is good if $|C_h| \le 2^m/n^c$. By Markov's inequality, we have that $\Pr_{h\in\mathcal{H}}[|C_h| > 2^m/n^c] \le 1/n^c$.
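To make the collision set $C_h$ and the notion of a good hash function concrete, the Python sketch below draws $h$ from the family of all $\ell \times n$ matrices over GF(2). This is one linear universal family; the proof does not fix a specific $\mathcal{H}$, so this choice is an assumption. A random set of $2^m$ strings stands in for $\mathrm{Supp}(Z)$, and the sketch estimates $\Pr_h[|C_h| > 2^m/n^c]$ at toy sizes. The helper names are hypothetical.

```python
import math
import random


def parity(v: int) -> int:
    """Parity of the bits of v (an inner product over GF(2) when v = x & row)."""
    return bin(v).count("1") & 1


def random_linear_hash(n: int, ell: int, rng: random.Random):
    """Draw a uniformly random ell x n matrix H over GF(2) and return h(x) = x H^T.

    Bit i of h(x) is the inner product of x with row i. The family is linear, and
    for x != x' it gives Pr[h(x) = h(x')] = 2^-ell (universality)."""
    rows = [rng.getrandbits(n) for _ in range(ell)]

    def h(x: int) -> int:
        return sum(parity(x & row) << i for i, row in enumerate(rows))

    return h


def collision_set(h, support):
    """C_h: the elements of the support whose hash value is shared by another element."""
    buckets = {}
    for x in support:
        buckets.setdefault(h(x), []).append(x)
    return {x for xs in buckets.values() if len(xs) > 1 for x in xs}


# Toy parameters (assumed for illustration): n = 16, m = 4, c = 1.
rng = random.Random(2024)
n, m, c = 16, 4, 1
ell = m + 2 * c * int(math.log2(n))          # output length m + 2c log n = 12
support = rng.sample(range(2 ** n), 2 ** m)  # stand-in for Supp(Z), a flat set of size 2^m
threshold = 2 ** m / n ** c                  # h is "good" if |C_h| <= 2^m / n^c
trials = 1000
bad = sum(
    len(collision_set(random_linear_hash(n, ell, rng), support)) > threshold
    for _ in range(trials)
)
print(f"empirical Pr[h not good] = {bad / trials:.4f}, Markov bound 1/n^c = {1 / n ** c:.4f}")
```

With these toy values the threshold $2^m/n^c$ equals 1, so a good $h$ is simply one with no collision on the support (a nonempty $C_h$ has at least two elements).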

Consider the function $g : \{0,1\}^m \times \mathcal{H} \to \mathcal{H} \times \{0,1\}^{m+2c\log n}$ given by $g(y, h) = (h, h(f(y)))$. Note that $g$ is polynomial-time computable. By the assumption that one-way functions do not exist, and thus (by Theorem 2) distributionally one-way functions do not exist, there is a polynomial-time algorithm $A$ such that the statistical distance between $((y, h), g(y, h))$ and $(A(g(y, h)), g(y, h))$ is at most $n^{-c}$, where $y \sim U_m$ and $h$ is chosen uniformly from $\mathcal{H}$. Then, it holds that

$$\Pr_{A,y,h}\bigl[g(A(g(y,h))) = g(y,h)\bigr] \ge 1 - \frac{1}{n^c},$$

where the probability is taken over the random coins of $A$, $y \sim U_m$, and $h \leftarrow \mathcal{H}$. Thus, we have that

$$\Pr_{A,y,h}\bigl[g(A(g(y,h))) = g(y,h) \wedge h \text{ is good}\bigr] \ge 1 - \frac{2}{n^c}.$$
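Both bounds can be derived in one line each. The first is the standard consequence of statistical distance for a fixed event, and the second is a union bound with the Markov bound on $h$ not being good. Writing $E = \{((y',h'), v) : g(y',h') = v\}$ for the event that the first component is a $g$-preimage of the second (an event of probability 1 under $((y,h), g(y,h))$):

$$\Pr_{A,y,h}\bigl[g(A(g(y,h))) = g(y,h)\bigr] = \Pr\bigl[(A(g(y,h)), g(y,h)) \in E\bigr] \ge \Pr\bigl[((y,h), g(y,h)) \in E\bigr] - n^{-c} = 1 - \frac{1}{n^c},$$

$$\Pr_{A,y,h}\bigl[g(A(g(y,h))) = g(y,h) \wedge h \text{ is good}\bigr] \ge 1 - \frac{1}{n^c} - \Pr_{h}\bigl[h \text{ is not good}\bigr] \ge 1 - \frac{2}{n^c}.$$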

By fixing the coins of $A$ and $h \in \mathcal{H}$ (an averaging argument), there exist a deterministic algorithm $A'$ and $h_0 \in \mathcal{H}$ such that $h_0$ is good and

$$\Pr_{y}\bigl[g(A'(g(y,h_0))) = g(y,h_0)\bigr] \ge 1 - \frac{2}{n^c}.$$
For $y \in \{0,1\}^m$ satisfying $g(A'(g(y,h_0))) = g(y,h_0)$, we write $A'(g(y,h_0)) = (y', h')$, where $A'_1(g(y,h_0)) = y'$ and $A'_2(g(y,h_0)) = h'$. Then, it holds that $h' = h_0$ and $h_0(f(y)) = h_0(f(y'))$. Furthermore, since $h_0$ is good, $\Pr_y[f(y) \notin C_{h_0}] \ge 1 - 1/n^c$. Let $H_0 \in \{0,1\}^{(m+c\log n)\times n}$ be a matrix such that $xH_0^{\top} = h_0(x)$ for $x \in \{0,1\}^n$. (Such a matrix exists since $\mathcal{H}$ is a family of linear hash functions.) Consider a linear coding scheme in which $H_0$ is employed as the parity-check matrix and $A'_1$ is employed for recovering errors from syndromes. That is, $\mathrm{Enc}(x) = xG$ for a matrix $G \in \{0,1\}^{(n-m-c\log n)\times n}$ satisfying $GH_0^{\top} = 0$, and $\mathrm{Dec}(w) = \bigl(w - f(A'_1(h_0, wH_0^{\top}))\bigr)G^{-1}$, where $G^{-1} \in \{0,1\}^{n\times(n-m-c\log n)}$ is a right inverse matrix of $G$. Then, for any $x \in \{0,1\}^{n-m-c\log n}$,

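$$\begin{aligned}
\Pr_{y\sim U_m}\bigl[\mathrm{Dec}(\mathrm{Enc}(x)+f(y)) = x\bigr]
&= \Pr_{y\sim U_m}\bigl[\mathrm{Enc}(x)+f(y)-f\bigl(A'_1(h_0,(\mathrm{Enc}(x)+f(y))H_0^{\top})\bigr) = xG\bigr]\\
&= \Pr_{y\sim U_m}\bigl[f(A'_1(g(y,h_0))) = f(y)\bigr],
\end{aligned}$$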

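where we use the properties that $GG^{-1} = I$, $\mathrm{Enc}(x) = xG$, $GH_0^{\top} = 0$, and $xH_0^{\top} = h_0(x)$; in particular, $(\mathrm{Enc}(x)+f(y))H_0^{\top} = h_0(f(y))$, so the input to $A'_1$ is exactly $g(y,h_0)$. The event $g(A'(g(y,h_0))) = g(y,h_0)$ has probability at least $1 - 2/n^c$, and the event $f(y) \notin C_{h_0}$ has probability at least $1 - 1/n^c$, so both occur with probability at least $1 - 3/n^c$. When both occur, $y' = A'_1(g(y,h_0))$ satisfies $h_0(f(y')) = h_0(f(y))$ with $f(y') \in \mathrm{Supp}(Z)$, and since $f(y)$ has no collision under $h_0$, $f(y') = f(y)$. Therefore,

$$\Pr_{y\sim U_m}\bigl[f(A'_1(g(y,h_0))) = f(y)\bigr] \ge 1 - \frac{3}{n^c}.$$

Hence the statement follows. □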
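The coding scheme in this proof is ordinary syndrome decoding. The Python sketch below runs it at toy sizes under loudly stated assumptions: $\mathrm{Supp}(Z)$ is an arbitrary small random set rather than $f(U_m)$, the parity-check matrix is resampled until it is collision-free on that set (so $C_{h_0} = \emptyset$), and the map from a syndrome to the error vector, which the proof obtains as $f \circ A'_1$ from the inverter guaranteed by the nonexistence of one-way functions, is replaced by a brute-force lookup table. The helpers `make_scheme` and `generator_from_parity_check` are hypothetical names, not from the paper.

```python
import random


def parity(v: int) -> int:
    return bin(v).count("1") & 1


def syndrome(w: int, H: list) -> int:
    """w H^T over GF(2): bit i is the inner product of w with row i of H."""
    return sum(parity(w & row) << i for i, row in enumerate(H))


def rref(H: list, n: int):
    """Reduced row echelon form of H over GF(2): (nonzero rows, pivot columns)."""
    rows, pivots, r = list(H), [], 0
    for col in range(n):
        sel = next((i for i in range(r, len(rows)) if rows[i] >> col & 1), None)
        if sel is None:
            continue
        rows[r], rows[sel] = rows[sel], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i] >> col & 1:
                rows[i] ^= rows[r]  # clear this column in every other row
        pivots.append(col)
        r += 1
    return rows[:r], pivots


def generator_from_parity_check(H: list, n: int):
    """Rows of G form a basis of {v : v H^T = 0}; message bit j sits at column free[j]."""
    rows, pivots = rref(H, n)
    free = [col for col in range(n) if col not in pivots]
    G = []
    for f in free:
        v = 1 << f  # this free coordinate is 1, all other free coordinates are 0
        for row, p in zip(rows, pivots):
            if row >> f & 1:
                v |= 1 << p  # pivot coordinate forced by that row
        G.append(v)
    return G, free


def make_scheme(H: list, n: int, support: list):
    """Enc(x) = xG; Dec(w) = (w - e) G^{-1}, with e recovered from the syndrome w H^T."""
    G, free = generator_from_parity_check(H, n)
    # Brute-force stand-in for the map from a syndrome to f(A'_1(h_0, syndrome)).
    table = {syndrome(e, H): e for e in support}
    assert len(table) == len(support), "H must be collision-free on the support"

    def enc(x: int) -> int:
        c = 0
        for j, g in enumerate(G):
            if x >> j & 1:
                c ^= g
        return c

    def dec(w: int) -> int:
        e = table.get(syndrome(w, H), 0)  # estimated error vector
        cw = w ^ e                         # subtraction is XOR over GF(2)
        # G^{-1}: read the message off the free coordinates, so that G G^{-1} = I.
        return sum((cw >> f & 1) << j for j, f in enumerate(free))

    return enc, dec, len(G)


# Toy parameters (assumed): n = 16, |Supp(Z)| = 2^3, and ell = 10 parity checks.
rng = random.Random(7)
n, m, ell = 16, 3, 10
support = rng.sample(range(2 ** n), 2 ** m)
while True:  # resample H until it is collision-free on the support (C_{h_0} empty)
    H = [rng.getrandbits(n) for _ in range(ell)]
    if len({syndrome(e, H) for e in support}) == len(support):
        break
enc, dec, k = make_scheme(H, n, support)
print(f"message length k = {k}, block length n = {n}")
```

Every message decodes correctly against every error in the support because $\mathrm{Enc}(x) + e$ and $e$ have the same syndrome, which is the identity $(\mathrm{Enc}(x)+f(y))H_0^{\top} = h_0(f(y))$ used in the proof. In the theorem itself, the exponential-size table is replaced by the efficient inverter, which is what makes the decoder run in polynomial time.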

6 Conclusions 

In this work, we studied the correctability of samplable additive errors with unbounded error rate. We considered a relatively simple setting in which the error distribution is identical for every coding scheme and codeword. The results imply that even when a distribution is not pseudorandom (as witnessed by a membership test), it is difficult to correct every such samplable distribution by efficient coding schemes. Nevertheless, a positive result can be obtained if we consider much more structured errors, such as errors from linear subspaces. We list some possible directions for future work.

Further study on the correctability. In this work, we have mostly discussed impossibility results, so showing non-trivial possibility results would be interesting. A possible direction is to consider errors that are more structured than samplable errors. One can consider computationally structured errors, such as errors computed by log-space machines, constant-depth circuits, or monotone circuits. One can also consider other types of structure, e.g., errors introduced in a split-state manner: an error vector is split into several parts, and each part is computed independently. This model has been well studied in the context of leakage-resilient cryptography [6, 17] and non-malleable codes [7, 3, 1]. BSC can be seen as an extreme case of this type of channel in which each error bit is computed by the same biased sampler.

Characterizing correctability. We have investigated the correctability of samplable additive errors using the Shannon entropy as a criterion. There may be a better criterion for characterizing the correctability of these errors, possibly one related to efficient computability, to which samplability is directly related. Since all the results in this paper deal with flat distributions, they can also be stated using other entropies such as the min-entropy. Since we have considered general distributions as error distributions, the information-spectrum approach [23, 11] may be more suitable.